Basketball: how likely is it to score?
When playing basketball we can ask ourselves: how likely is it to score from a given position on the court?
We use data from NBA games of the 2006–2007 season. We will consider all types of shots.
```julia
begin
	using CSV
	using DataFrames
	using StatsPlots
	season_2007 = CSV.read("./data/seasons/shots_season_2006_2007.csv", DataFrame);
	#shots = vcat(season_2007, season_2008, season_2009);
	shots = season_2007
	shots.distance = sqrt.(shots.x .^ 2 + shots.y .^ 2);
	shots.angle = atan.(shots.y ./ shots.x);
	filter!(x -> x.distance > 1, shots)
end;
```

But how do we interpret the data?
In the sketch below we show a drawing of a basketball court, its dimensions, and how to interpret the data in the table.
```julia
begin
	using Images
	img = load("./images/basket_court.png")
end
```

So, x and y have their origin at the hoop, and we compute the distance from this point to where the shot was made. We also compute the angle with respect to the x axis, shown as θ in the previous sketch.
We now plot where the shots were made:
We see that the shots are quite uniformly distributed around the hoop, except at distances very close to it. To see this better, we plot the histograms for each axis, x and y.
But we are interested in the shots that were scored, so we now filter for the shots made and plot the histogram of each axis.
```julia
shots_made = filter(x -> x.result == 1, shots);
```

If we plot the counts in 3D, we obtain the wireframe plot shown below.
```julia
begin
	using StatsBase
	h = fit(Histogram, (shots_made.x, shots_made.y), nbins=40)
	# plot(h) would show the same data as a 2D histogram
	wireframe(midpoints(h.edges[2]), midpoints(h.edges[1]), h.weights,
		zlabel="counts", xlabel="y", ylabel="x", camera=(40, 40))
	title!("Histogram of shots scored")
end
```

The first model we are going to propose is a Bernoulli model.
Why a Bernoulli Distribution?
A Bernoulli distribution results from an experiment with two possible outcomes: one that we usually call a success and the other a failure. In our case, the success is scoring the shot and the failure is missing it.
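As a quick numerical sketch using Distributions.jl (the probability 0.4 here is an arbitrary illustrative value, not estimated from the data), a Bernoulli variable with success probability p has mean p and variance p(1 - p):

```julia
using Distributions

shot = Bernoulli(0.4)   # a hypothetical shot that scores with probability 0.4
mean(shot)              # the expected value equals p = 0.4
var(shot)               # the variance equals p * (1 - p) = 0.24
rand(shot, 5)           # simulate five shots: a vector of trues (scored) and falses (missed)
```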
The only parameter of the Bernoulli distribution is the probability p of a success. We are going to model this parameter with a logistic function:
```julia
begin
	using Turing
	using StatsFuns: logistic
end
```

Why a logistic function?
We are going to model the probability of scoring as a function of some variables, for example the distance to the hoop, and we want the probability of scoring to increase as we get closer to it. Our probability also needs to lie between 0 and 1, so a nice function to map our values is the logistic function.
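As a small sanity check of these properties (using the same `logistic` from StatsFuns as in the cells below): it maps any real input into (0, 1), equals 0.5 at zero, and is monotonically increasing:

```julia
using StatsFuns: logistic

logistic(0.0)     # 0.5, the midpoint
logistic(-10.0)   # ≈ 0.000045, close to 0
logistic(10.0)    # ≈ 0.999955, close to 1

# Monotonic: a larger input to the logistic always means a higher probability,
# so e.g. subtracting a distance term lowers the probability of scoring.
logistic(-1.0) < logistic(0.0) < logistic(1.0)   # true
```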
So, the model we are going to propose is:

p = logistic(a - b * distance + c * angle)

result ~ Bernoulli(p)
But what values and prior distributions are we going to propose for a, b and c?
Let's see:
Prior Predictive Checks: Part I
Suppose we say that our prior distributions for a, b and c are three Gaussian distributions with mean 0 and variance 1. Let's sample from them and see what possible behaviours for p they imply:
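The sampling cell for these prior predictive curves is not shown above; a minimal sketch, assuming the variable names `n_samples`, `possible_distances` and `a_prior_sampling` that the later cells reuse, and leaving out the angle term to focus on the distance behaviour, could be:

```julia
using Distributions
using StatsFuns: logistic

n_samples = 100
possible_distances = range(0, 1, length=100)       # normalized distances to the hoop
a_prior_sampling = rand(Normal(0, 1), n_samples)
b_prior_sampling = rand(Normal(0, 1), n_samples)

# One candidate prior predictive curve p(distance) per sampled (a, b) pair.
predicted_p = [logistic.(a_prior_sampling[i] .- b_prior_sampling[i] .* possible_distances)
               for i in 1:n_samples]
```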
We see that some of the predicted behaviours for p don't make sense. For example, for positive values of b we would be saying that the probability of scoring increases as we move away from the hoop. So we instead propose the parameter b to be the negative of a LogNormal-distributed value. The predicted values for p are shown below.
So, our model now has as priors:

a ~ Normal(0, 1)

b ~ LogNormal(1, 0.25), taken with a negative sign in the model

c ~ Normal(0, 1)
Sampling values from the above prior distributions, we obtain the plot shown below for the predicted values of p.
```julia
begin
	b_prior_sampling_negative = rand(LogNormal(1, 0.25), n_samples)
	predicted_p_improved = []
	for i in 1:n_samples
		push!(predicted_p_improved,
			logistic.(a_prior_sampling[i] .- b_prior_sampling_negative[i] .* possible_distances))
	end
end
```

Now that we have the expected behaviour for p, we define our model and compute the posterior distributions from our data points.
Defining our model and computing posteriors
Now we define our model to sample from it:
```julia
@model logistic_regression(distances, angles, result, n) = begin
	# Set priors.
	a ~ Normal(0, 1)
	b ~ LogNormal(1, 0.25)
	c ~ Normal(0, 1)
	for i in 1:n
		p = logistic(a - b * distances[i] + c * angles[i])
		result[i] ~ Bernoulli(p)
	end
end
```

```julia
# Sample using HMC.
chain = mapreduce(c -> sample(logistic_regression(shots.distance[1:n] ./ maximum(shots.distance[1:n]), shots.angle[1:n], shots.result[1:n], n), HMC(0.05, 10), 1500), chainscat, 1:3)
```

```
Chains MCMC chain (1500×12×3 Array{Float64,3}):

Iterations        = 1:1500
Thinning interval = 1
Chains            = 1, 2, 3
Samples per chain = 1500
parameters        = a, b, c
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64

           a    0.2233    0.1350     0.0020    0.0023  3648.1578    1.0000
           b    1.7963    0.2827     0.0042    0.0045  4701.3859    1.0000
           c   -0.0358    0.1205     0.0018    0.0062   362.7575    1.0080

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           a   -0.0113    0.1370    0.2236    0.3083    0.4751
           b    1.2820    1.6064    1.7866    1.9737    2.3572
           c   -0.1805   -0.0820   -0.0312    0.0160    0.1089
```
```julia
plot(chain, dpi=60)
```
```julia
begin
	a_mean = mean(chain[:a])
	b_mean = mean(chain[:b])
	c_mean = mean(chain[:c])
end;
```

Now, plotting the posterior probability of scoring for a fixed angle of 45°, we have:
```julia
begin
	p_constant_angle = []
	for i in 1:length(chain[:a])
		push!(p_constant_angle,
			logistic.(chain[:a][i] .- chain[:b][i] .* possible_distances .+ chain[:c][i] .* π/4));
	end
end
```

The plot shows that the probability of scoring increases as the distance to the hoop decreases, which makes sense.
```julia
begin
	p_constant_distance = []
	for i in 1:length(chain[:a])
		push!(p_constant_distance,
			logistic.(chain[:a][i] .- chain[:b][i] .* 0.5 .+ chain[:c][i] .* possible_angles));
	end
end
```

We see that the model predicts an almost constant probability with respect to the angle.
New model and prior predictive checks: Part II
Now we propose another model, of the form:

p = logistic(a + b ^ distance + c * angle)
But for what values of b does the model make sense?
```julia
begin
	f1(x) = 0.3^x
	f2(x) = 1.5^x
	f3(x) = -0.3^x
	f4(x) = -1.5^x
end;
```

Analysing the possible values for b, the one that makes sense is f1: since the logistic function takes higher values for higher inputs, we want the contribution of the distance term to p to grow as the distance decreases, which is exactly what b ^ distance does for 0 < b < 1.
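We can verify this behaviour numerically (redefining f1 and f2 from the cell above so the snippet stands alone): a base between 0 and 1 gives larger values for shorter normalized distances, while a base greater than 1 does the opposite:

```julia
f1(x) = 0.3^x   # base in (0, 1): decreasing in the distance x
f2(x) = 1.5^x   # base > 1: increasing in the distance x

f1(0.1) > f1(0.9)   # true: shots close to the hoop get a larger boost to p
f2(0.1) < f2(0.9)   # true: the opposite behaviour, which we don't want
```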
Defining the new model and computing posteriors
```julia
@model logistic_regression_exp(distances, angles, result, n) = begin
	# Set priors.
	a ~ Normal(0, 1)
	b ~ Beta(2, 5)
	c ~ Normal(0, 1)
	for i in 1:n
		p = logistic(a + b^distances[i] + c * angles[i])
		result[i] ~ Bernoulli(p)
	end
end
```

```julia
# Sample using HMC.
chain_exp = mapreduce(c -> sample(logistic_regression_exp(shots.distance[1:n] ./ maximum(shots.distance[1:n]), shots.angle[1:n], shots.result[1:n], n), HMC(0.05, 10), 1500), chainscat, 1:3)
```

```
Chains MCMC chain (1500×12×3 Array{Float64,3}):

Iterations        = 1:1500
Thinning interval = 1
Chains            = 1, 2, 3
Samples per chain = 1500
parameters        = a, b, c
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse       ess      rhat
      Symbol   Float64   Float64    Float64   Float64   Float64   Float64

           a   -0.9032    0.1478     0.0022    0.0088  267.9893    1.0052
           b    0.1315    0.1070     0.0016    0.0065  264.3724    1.0033
           c   -0.0423    0.1234     0.0018    0.0060  391.2129    1.0063

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           a   -1.1876   -1.0049   -0.9045   -0.8012   -0.6231
           b    0.0068    0.0482    0.1052    0.1857    0.4141
           c   -0.1873   -0.0816   -0.0347    0.0146    0.1060
```

```julia
plot(chain_exp, dpi=60)
```
```julia
begin
	p_exp_constant_angle = []
	for i in 1:length(chain_exp[:a])
		push!(p_exp_constant_angle,
			logistic.(chain_exp[:a][i] .+ chain_exp[:b][i] .^ possible_distances .+ chain_exp[:c][i] .* π/4));
	end
end
```

```julia
begin
	plotly()
	plot(dist_grid, rad2deg.(angle_grid), z, color=:blue, zlabel="Probability", legend=false, camera=(10, 0))
	xlabel!("Normalized distance")
	ylabel!("Angle [deg]")
	title!("Mean Probability of scoring from posterior sampled values")
end
```

Does the Period affect the probability of scoring?
Now we will try to answer this question. We propose a model with a separate set of parameters for each of the four possible periods.
```julia
@model logistic_regression_period(distances, angles, result, period, n) = begin
	# Set priors: one parameter per period.
	a ~ filldist(Normal(0, 1), 4)
	b ~ filldist(Beta(2, 5), 4)
	c ~ filldist(Normal(0, 1), 4)
	for i in 1:n
		p = logistic(a[period[i]] + b[period[i]]^distances[i] + c[period[i]] * angles[i])
		result[i] ~ Bernoulli(p)
	end
end
```